Matrix momentum for practical natural gradient learning

Authors

Silvia Scarpetta

Magnus Rattray

David Saad

Abstract

An on-line learning rule, based on the introduction of a matrix momentum term, is presented, aimed at alleviating the computational costs of standard natural gradient learning. The new rule, natural gradient matrix momentum, is analysed in the case of two-layer feed-forward neural network learning via methods of statistical physics. It appears to provide a practical algorithm that performs as well as standard natural gradient descent in both the transient and asymptotic regimes but with a hugely reduced complexity.
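To illustrate the general idea of a matrix momentum term, the following is a minimal sketch and not the rule analysed in the paper. It assumes a momentum matrix of the form I - kG, where G stands in for the Fisher information matrix, and it uses a hypothetical quadratic toy loss rather than a two-layer network. The names `matrix_momentum_descent`, `fisher_vec`, `eta` and `k` are illustrative choices that do not appear in the abstract.

```python
import numpy as np

def matrix_momentum_descent(grad, fisher_vec, w0, eta, k, steps):
    """Gradient descent with an assumed momentum matrix (I - k*G)."""
    w = np.asarray(w0, dtype=float).copy()
    v = np.zeros_like(w)  # previous parameter change
    for _ in range(steps):
        # (I - k*G) v needs only a matrix-vector product G v;
        # G is never inverted and need not be stored explicitly.
        v = -eta * grad(w) + v - k * fisher_vec(w, v)
        w = w + v
    return w

# Hypothetical toy problem: loss 0.5 * (w - w_star)^T A (w - w_star),
# with the badly conditioned matrix A standing in for G.
A = np.diag([0.1, 1.0, 10.0])
w_star = np.array([1.0, -2.0, 0.5])

w_final = matrix_momentum_descent(
    grad=lambda w: A @ (w - w_star),
    fisher_vec=lambda w, v: A @ v,
    w0=np.zeros(3), eta=0.005, k=0.05, steps=10000,
)
```

Under these assumptions, when the parameters change slowly the momentum term acts like an effective learning rate of roughly (eta/k) G^{-1}, which is the sense in which such a rule can mimic natural gradient descent without a matrix inversion.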

Similar resources

Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent

We propose a generic method for iteratively approximating various second-order gradient steps - Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient - in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n). Two recent acceleration techniques for on-line learning, matrix momentum and stochastic meta-descent (SMD), implement this appro...

Adaptive natural gradient learning algorithms for various stochastic models

The natural gradient method has an ideal dynamic behavior which resolves the slow learning speed of the standard gradient descent method caused by plateaus. However, it is required to calculate the Fisher information matrix and its inverse, which makes the implementation of the natural gradient almost impossible. To solve this problem, a preliminary study has been proposed concerning an adaptiv...

True Asymptotic Natural Gradient Optimization

We introduce a simple algorithm, True Asymptotic Natural Gradient Optimization (TANGO), that converges to a true natural gradient descent in the limit of small learning rates, without explicit Fisher matrix estimation. For quadratic models the algorithm is also an instance of averaged stochastic gradient, where the parameter is a moving average of a “fast”, constant-rate gradient descent. TANGO...

A Hybrid Optimization Algorithm for Learning Deep Models

Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) field such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimized problem. In deep learning, the most important problem that can be solved by optimization is neural n...

Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of Back propagation algorithm with changing training patterns and the second momentum term in feed forward neural networks. This analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...

Publication date: 1999
